Skip to content

metrics: add /metrics endpoint and console_helm_install_count metric#10194

Merged
openshift-merge-robot merged 2 commits intoopenshift:masterfrom
zonggen:helm-metrics
Oct 28, 2021
Merged

metrics: add /metrics endpoint and console_helm_install_count metric#10194
openshift-merge-robot merged 2 commits intoopenshift:masterfrom
zonggen:helm-metrics

Conversation

@zonggen
Copy link
Copy Markdown

@zonggen zonggen commented Oct 6, 2021

Works with openshift/console-operator#601 to add
a console_helm_install_count counter vector metric. This change will
increment the counter each time a user installs a Helm chart in the
console.

Closes: https://issues.redhat.com/browse/HELM-235
Signed-off-by: Allen Bai abai@redhat.com

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2021
@kyoto
Copy link
Copy Markdown
Member

kyoto commented Oct 7, 2021

FYI @openshift/openshift-team-monitoring

@zonggen zonggen changed the title [WIP] metrics: add /metrics endpoint and console_helm_install_count metric metrics: add /metrics endpoint and console_helm_install_count metric Oct 7, 2021
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 7, 2021

Ready for review @dperaza4dustbit.

@openshift-ci openshift-ci Bot added kind/dependency-change Categorizes issue or PR as related to changing dependencies and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 7, 2021
Comment thread pkg/helm/metrics/metrics.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Help: "Number of Helm installation from console.",
Help: "Number of Helm installations from console broken down by chart and version.",

Comment thread pkg/helm/metrics/metrics.go Outdated
Comment thread pkg/helm/metrics/metrics.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the only way this function would panic is if you pass a wrong number of labels to WithLabelValues() which would mean that the code is broken somehow.
It's better to use GetMetricWithLabelValues or GetMetricWith() and check the returned error.

Comment thread pkg/helm/actions/install_chart.go Outdated
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 8, 2021

@simonpasquier Thanks for reviewing! Updated according to your comments.

Copy link
Copy Markdown
Contributor

@dperaza4dustbit dperaza4dustbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Oct 12, 2021
@zonggen zonggen force-pushed the helm-metrics branch 3 times, most recently from 280cf92 to 1da9214 Compare October 12, 2021 20:47
Copy link
Copy Markdown
Member

@jhadvig jhadvig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zonggen thanks for the PR. Adding couple of comments:
Also could you please squash your PR afterwards, at least to two commits - code changes & vendor.
Thanks

Comment thread vendor/OWNERS Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be removed

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back.

Comment thread pkg/helm/metrics/metrics.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make this Names consts here and use them here and in the metrics_test.go

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread pkg/helm/metrics/metrics.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make these a consts as well and use them across the pkg

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated :)

Comment thread pkg/helm/metrics/metrics.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a native speaker but 'broken down by' sounds a bit odd to me
maybe?

Suggested change
Help: "Number of Helm installations from console broken down by chart name and version.",
Help: "Number of Helm installations from console, based on chart name and version.",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number of Helm installations from console by chart name and version. should be good

Copy link
Copy Markdown
Author

@zonggen zonggen Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with Number of Helm installations from console by chart name and version., please let me know if that sounds okay @jhadvig

Comment thread pkg/helm/metrics/metrics.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wont recover the function since the GetMetricWithLabelValues throws a panic.
Instead

func HandleconsoleHelmInstallsTotal(chartName, chartVersion string) {
	defer recoverMetricPanic()
        klog.V(4).Infof("metric console_helm_installs_total: %s %s", chartName, chartVersion)
	counter, _ := consoleHelmInstallsTotal.GetMetricWithLabelValues(chartName, chartVersion)
        counter.Add(1)
}

// We will never want to panic console server because of metric saving.
// Therefore, we will recover our panics here and error log them
// for later diagnosis but will never fail the console server.
func recoverMetricPanic() {
	if r := recover(); r != nil {
		klog.Errorf("Recovering from metric function - %v", r)
	}
}

We should do the same for the HandleconsoleHelmUpgradesTotal & HandleconsoleHelmUninstallsTotal functions

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the official doc it seems like WithLabelValues will panic in the case when GetMetricWithLabelValues returns an error: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#CounterVec.GetMetricWithLabelValues vs. https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#CounterVec.WithLabelValues.

Could you confirm the above doc and GetMetricWithLabelValues will indeed panic? Will proceed to update after confirming :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetMetricWithLabelValues won't panic. WithLabelValues would only panic if you give it an incorrect number of parameters (e.g. your metric declares 2 labels but you only passed 1 value).
In general, it's better to let WithLabelValues crash the program since it means that there's a programmatic error.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonpasquier you are right, WithLabelValues would only panic. Misread the code 👍

Comment thread pkg/helm/metrics/metrics_test.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are other RC (201+) OK ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems redundant for GET responses. Removed httpOK and explicitly check 200.

Comment thread pkg/server/middleware.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here Im not sure if this should be part of the middleware code. Since this serves only the purpose of the metrics endpoint. I think this should be part of the handler itself (maybe as a wrapper?)

Copy link
Copy Markdown
Author

@zonggen zonggen Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this part to a new metricsHandler in server.go to preprocess the request and leave middleware.go unchanged.

@zonggen zonggen force-pushed the helm-metrics branch 2 times, most recently from 967308a to a9a8a42 Compare October 13, 2021 15:31
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 13, 2021

Thank you for the reviews @jhadvig!
Updated and squashed commits. Addressed most of them except the GetMetricWithLabelValues comment. PTAL.

@jhadvig
Copy link
Copy Markdown
Member

jhadvig commented Oct 13, 2021

@zonggen thanks for addressing the comments. Unfortunately the PR needs rebase due to #10224

Also I see that the backend CI test is failing:
Screenshot 2021-10-14 at 00 04 44

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 13, 2021
Comment thread pkg/helm/metrics/metrics.go Outdated
Comment on lines 13 to 14
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I would suggest to shorten the label names since from the metric names already tell that it relates to console and Helm.

Suggested change
consoleHelmChartNameLabel = "console_helm_chart_name"
consoleHelmChartVersionLabel = "console_helm_chart_version"
consoleHelmChartNameLabel = "chart_name"
consoleHelmChartVersionLabel = "chart_version"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated :)

Adds prometheus metrics dependencies for helm. Result of `go mod vendor`
and `go mod tidy`.

Signed-off-by: Allen Bai <abai@redhat.com>
Works with openshift/console-operator#601 to add
a console_helm_install_count counter vector metric. This change will
increment the counter each time a user installs a Helm chart in the
console.

Closes: https://issues.redhat.com/browse/HELM-235
Signed-off-by: Allen Bai <abai@redhat.com>

helm/metrics: update according to code reviews

Signed-off-by: Allen Bai <abai@redhat.com>

helm/metrics: add metrics for helm release upgrades and uninstalls

Signed-off-by: Allen Bai <abai@redhat.com>
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 14, 2021

/test e2e-gcp-console

1 similar comment
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 15, 2021

/test e2e-gcp-console

Copy link
Copy Markdown
Member

@jhadvig jhadvig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2021
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Oct 18, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dperaza4dustbit, jhadvig, zonggen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2021
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@jhadvig
Copy link
Copy Markdown
Member

jhadvig commented Oct 19, 2021

/hold for approvals
/assign @yapei @opayne1 @RickJWagner

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2021
@opayne1
Copy link
Copy Markdown
Contributor

opayne1 commented Oct 19, 2021

/label docs-approved

@openshift-ci openshift-ci Bot added the docs-approved Signifies that Docs has signed off on this PR label Oct 19, 2021
@yapei
Copy link
Copy Markdown
Contributor

yapei commented Oct 26, 2021

Install/Uninstall Helm Releases and check the metrics data

$ oc -n openshift-console exec console-76cf6658f8-w9ghl -- curl -k -H "Authorization: Bearer $token" 'https://xxxx:8443/metrics' | grep "helm"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7023    0  7023    0     0   403k      0 --:--:-- --:--:-- --:--:--  403k
# HELP console_helm_installs_total Number of Helm installations from console by chart name and version.
# TYPE console_helm_installs_total counter
console_helm_installs_total{chart_name="dotnet",chart_version="0.0.1"} 1
console_helm_installs_total{chart_name="infinispan",chart_version="0.2.0"} 1
console_helm_installs_total{chart_name="quarkus",chart_version="0.0.3"} 1
console_helm_installs_total{chart_name="vault",chart_version="0.17.0"} 1
# HELP console_helm_uninstalls_total Number of Helm release uninstallations from console by chart name and version.
# TYPE console_helm_uninstalls_total counter
console_helm_uninstalls_total{chart_name="infinispan",chart_version="0.2.0"} 1
console_helm_uninstalls_total{chart_name="vault",chart_version="0.17.0"} 1

@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 26, 2021

A friendly bump @RickJWagner @yapei, is there anything else needed from my side to get this approved? :)

@yapei
Copy link
Copy Markdown
Contributor

yapei commented Oct 27, 2021

tested Install/Uninstall/Upgrade of Helm releases, working as intended

# oc -n openshift-console exec console-67f47f4469-4nmqn -- curl -k -H "Authorization: Bearer $token" 'https://10.130.0.37:8443/metrics' | grep "helm"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7173    0  7173    0     0   437k      0 --:--:-- --:--:-- --:--:--  466k
# HELP console_helm_installs_total Number of Helm installations from console by chart name and version.
# TYPE console_helm_installs_total counter
console_helm_installs_total{chart_name="dotnet",chart_version="0.0.1"} 1
console_helm_installs_total{chart_name="jenkins",chart_version="0.0.3"} 1
console_helm_installs_total{chart_name="quarkus",chart_version="0.0.3"} 1
# HELP console_helm_uninstalls_total Number of Helm release uninstallations from console by chart name and version.
# TYPE console_helm_uninstalls_total counter
console_helm_uninstalls_total{chart_name="dotnet",chart_version="0.0.1"} 1
# HELP console_helm_upgrades_total Number of Helm release upgrades from console by chart name and version.
# TYPE console_helm_upgrades_total counter
console_helm_upgrades_total{chart_name="jenkins",chart_version="0.0.3"} 1
console_helm_upgrades_total{chart_name="quarkus",chart_version="0.0.3"} 1

/label qe-approved

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label Oct 27, 2021
@RickJWagner
Copy link
Copy Markdown

/label px-approved

@openshift-ci openshift-ci Bot added the px-approved Signifies that Product Support has signed off on this PR label Oct 27, 2021
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 27, 2021

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 27, 2021
@zonggen
Copy link
Copy Markdown
Author

zonggen commented Oct 27, 2021

/retest

@openshift-bot
Copy link
Copy Markdown
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 5087780 into openshift:master Oct 28, 2021
@spadgett spadgett added this to the v4.10 milestone Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. component/backend Related to backend docs-approved Signifies that Docs has signed off on this PR kind/dependency-change Categorizes issue or PR as related to changing dependencies lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.